Automated optimisation of solubility and conformational stability of antibodies and proteins

Biologics, such as antibodies and enzymes, are crucial in research, biotechnology, diagnostics, and therapeutics. Often, biologics with suitable functionality are discovered, but their development is impeded by developability issues. Stability and solubility are key biophysical traits underpinning developability potential, as they determine aggregation, correlate with production yield and poly-specificity, and are essential to access parenteral and oral delivery. While advances for the optimisation of individual traits have been made, the co-optimization of multiple traits remains highly problematic and time-consuming, as mutations that improve one property often negatively impact others. In this work, we introduce a fully automated computational strategy for the simultaneous optimisation of conformational stability and solubility, which we experimentally validate on six antibodies, including two approved therapeutics. Our results on 42 designs demonstrate that the computational procedure is highly effective at improving developability potential, while not affecting antigen-binding. We make the method available as a webserver at www-cohsoftware.ch.cam.ac.uk.

The manuscript by Rosace, Bennett et al. presents CamSol combination -a computational pipeline to co-optimize solubility and thermodynamic stability of antibodies (and potentially other proteins) by predicting combinations of favorable mutations. Computational pipelines like this one exist and are of broad interest for rational optimization of antibodies. Two of the essential tools (i.e., CamSol and FoldX) used in the presented approach were published previously and have proven to yield reasonable predictions. To me, the novelty of CamSol combinations comes from the input of phylogenetic information to increase the accuracy of the predictions. The prediction of combinations of mutations is also a step forward. The authors demonstrate with several proteins and dozens of mutants that most of the computational predictions are reasonable. The paper is well written, the method is available via a website which I found user-friendly. The data output from the server is very clear. In my opinion, better validation and some additional explanations of the method will be highly beneficial for the manuscript and will increase the value to justify publication in this journal. I listed some comments (not in the order of importance): • Studies on similar approaches combining colloidal and conformation stability predictions, e.g., TANGO and FoldX (PMID: 2832291), Aggrescan 3D and FoldX (PMID: 31049593), have to be mentioned and the advantages of CamSol combination over these methods should be clearly explained in the discussion. Would the automated Aggrescan 3D/FoldX prediction of mutations overlap to some extent with the individual mutations from Camsol combination?
• When using different databases (e.g., clinical-stage, OAE, abYsis) for the phylogenetic analysis, some of the predicted mutations overlap, but others do not. Non-expert users of CamSol combination will be faced with the challenge of deciding which databases to use. A better discussion (or maybe a table) highlighting the strong and weak points of using the different databases will be very useful. Comparisons and experimental validations for several antibody mutants optimized via CamSol combination using different sequence databases will be valuable.
• FoldX is known to work best with crystal structures as an input. However, the CamSol combination will most likely be used on homology models of antibodies. Can the authors show an unbiased comparison of the quality of the predictions when using crystal structures versus homology models? Does the server predict the same mutations?
• The melting temperature was used as a proxy for the conformational stability, but measurement of the dG of unfolding to validate the predictions would be more accurate. Can the authors provide dG values on selected mutants and the corresponding wild-types?
• It is not surprising that the ammonium sulphate precipitation did not yield reliable results to validate the predictions of the nanobodies. The cross-interaction chromatography might show some correlations with the predictions, but CIC is not really a measurement for colloidal stability. Can you use a light scattering approach to measure the second osmotic virial coefficient which is the actual thermodynamic parameter that should correlate with colloidal stability? Alternatively, affinity-capture self-interaction spectroscopy is a good proxy for colloidal stability.
• In the structures that I optimized with CamSol combinations, there were several interesting predictions. It appears that the colloidal stability of antibodies with basic isoelectric points is often increased by introducing more positive charges (which are known to drive polyspecificity). I did not observe a case where the method chose to optimize an antibody by introducing negative charges to significantly reduce the isoelectric point (although it is known that antibodies with acidic isoelectric points can exhibit high solubility and stability). What is the possible reason for this?
• A recent paper presented an in vitro approach to select stabilizing mutations in antibodies (PMID: 32286330). I wonder if the computational optimization with CamSol combination will predict some of the mutations reported in this paper.
• The validations for IgGs were performed by measurements on the scFv. Reformatting an scFv into a Fab can have unexpected effects on protein solubility/stability. Can the authors validate their predictions for several mutants produced as full-length IgGs?
• It will be highly valuable if the authors show correlations between the CamSol combination predictions and the production yields of the proteins.
• There is no validation of the method with proteins different than antibodies, although the title implies that the approach also works with other proteins. There is plenty of data on the effect of mutations on enzymatic stability/solubility (except for the used ProTherm database). The authors could make predictions for such enzymes and compare them to published datasets. Alternatively, the authors could select a challenging model protein (different than an antibody) for optimization by CamSol combination and substantiate the predictions with own experimental data to demonstrate the broad applicability value of their approach.

Reviewer #1 (Remarks to the Author):
This is an interesting manuscript describing a computational pipeline for simultaneously improving folded state stability and solubility of proteins. Importantly, the effectiveness of the pipeline is demonstrated experimentally with different antibody and fragment structures. It is therefore valid for the authors to claim that the method has the potential for impact in biopharmaceutical, and other, applications. The pipeline largely builds on existing methodology for solubility (CamSol) and stability (Fold-X) predictions, while confirming their applicability when computational and experimental values are compared. There is novelty in the combination of these methods with multiple sequence alignment data (for evolutionary suitability of mutations), and some tweaks of how to put it together. The pipeline is delivered via a web server.
I agree with the authors that the work is a potentially valuable tool for areas of biopharma and biotech that seek (likely commonly) improvements in conformational and colloidal stability at the same time. I feel that the manuscript is rather wordy, but I'm not clear how otherwise it could be constructed to improve the message, perhaps by moving more detail into the Suppl Info. Overall it is clear, and the figures good. Some questions follow.
We are extremely grateful to the reviewer for the positive assessment of our work, and for the constructive feedback and questions.
1. With regard to biopharmaceuticals, the authors mention the driver for increased stability at high concentrations, for storage and administration. Could they discuss the scale of improvement (e.g. mg/ml concentration) that their redesigns could typically achieve?
This is a very relevant question that unfortunately cannot be answered in a simple way. The answer depends very strongly on the protein under scrutiny, and especially on the chosen formulation condition. For example, the mg/mL gain afforded by mutations that affect electrostatic interactions will strongly depend on the ionic strength and pH of the formulation buffer. In our experiments we have selected a suite of widely used developability assays, whose measurements (e.g., midpoint of AMS precipitation or CIC retention time) are well-known to correlate with solubility and hence with the possibility of formulating to higher concentrations.
The key advantage of these assays, which is the reason why they are so widely used, is that they require relatively low amounts of purified protein material, while measuring the improvement in mg/mL in a pharmaceutically relevant formulation would require hundreds of mgs of material (and typically a one-year incubation time). In the revised manuscript we better explain our choice of assays in the discussion section. Because of these challenges, we feel that a discussion of the mg/mL improvement is outside the scope of the current work.
2. Early in the manuscript, a class of sites labelled as near surface (and possibly involved in aggregation) are noted. How many such sites make it into the final designs, versus the surface sites?
This answer also depends very strongly on the protein under scrutiny and on the maximum allowed number of mutations. In the revised manuscript (Section "(ii) Selection of candidate mutation sites") we have now added the paragraph: The typical number of mutation sites in each of these four groups strongly depends on the protein under scrutiny. The final report from the webserver contains a table with all identified candidate mutation sites, which includes information on how each site was identified (column "identified from", see Supplementary Files 1 to 9). Thus, for each run, this information can be extracted from the final report of the webserver, by looking at the origin of the mutation sites that are contained in the final designs.
3. In the nanobody example, mutation to arginine is seen to be problematic which, as the authors note, is a known phenomenon. How would a constraint to avoid this be added to the pipeline / should it?
This is an excellent point. We have decided not to introduce constraints based on liabilities related to stickiness of arginine, chemical degradation hotspots, immunogenicity hotspots etc., as different research groups and pharma companies use their own preferred guidelines depending on the intended application of the therapeutic molecule. In the revised discussion section, we have now added the sentences: At present, users have the option to exclude specific residues from the list of potential substitution targets. So, if any of the final designs contains liabilities, the calculation can be re-run by excluding the relevant residue(s) to get a different set of top-ranking designs (e.g., exclude arginine for specificity, asparagine for deamidation and glycosylation, etc.). Similarly, if any such liability is present in the WT protein, users can add the corresponding site as custom mutation site in the algorithm, so that mutations predicted to improve stability and solubility will be suggested to remove the liability.
4. For Fv example(s), there is a step of checking for interaction of a designed mutation with the antigen binding site. Is this part of the pipeline or an add-on? If an add-on, how would it be put into the pipeline?
The antigen-contact analysis was carried out on the WT antibody structures, to identify all sites in contact with the antigen. Then, if any of these sites was mutated in a selected design, we would flag the mutation at that site as potentially disruptive of affinity. We have now made this clearer in the corresponding paragraph of the main text and section of the supplementary material. This analysis is not part of the automated pipeline, but it can be carried out with any standard software for structure visualisation (we have used UCSF Chimera, but Pymol, VMD, MOE and most available molecular visualisation software can readily be used to identify contacts between protein chains). Ultimately, this analysis is exactly what is done routinely to assign the antibody paratope from a bound structure.
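The contact check described in this response can be sketched in a few lines. This is only an illustration under stated assumptions: the 4.5 Å distance cutoff, the function names and the toy coordinates are hypothetical and are not taken from the manuscript or the webserver, and the authors performed the analysis interactively in UCSF Chimera rather than with a script.

```python
import math

def contact_sites(antibody_atoms, antigen_atoms, cutoff=4.5):
    """Return antibody residue IDs with at least one atom within `cutoff` (in Å)
    of any antigen atom.

    Both inputs are iterables of (residue_id, (x, y, z)) tuples.
    The default cutoff is an assumption, not a value from the manuscript.
    """
    antigen_coords = [coord for _, coord in antigen_atoms]
    sites = set()
    for residue_id, coord in antibody_atoms:
        if residue_id in sites:
            continue  # residue already known to be in contact
        if any(math.dist(coord, other) <= cutoff for other in antigen_coords):
            sites.add(residue_id)
    return sorted(sites)

def flag_mutations(design_sites, contacts):
    """Return the mutated sites of a design that coincide with antigen-contact sites."""
    return sorted(set(design_sites) & set(contacts))

# Toy coordinates, for illustration only (not taken from any structure).
antibody = [("H52", (0.0, 0.0, 0.0)), ("H53", (10.0, 0.0, 0.0))]
antigen = [("A1", (3.0, 0.0, 0.0))]
contacts = contact_sites(antibody, antigen)
flagged = flag_mutations(["H53", "H52"], contacts)
```

Any site returned by `flag_mutations` would be treated as potentially disruptive of affinity, mirroring the manual flagging step described above.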
5. Do we get sufficient feel for what the mutations are doing, at least in terms of Fold-X ddG? E.g. for the nanobody there look to be several charge changes; are these just improving the plus/minus network (and then solubility of course too)? Whereas in the Fvs there seem to be more non-charged mutations: what is the anticipated basis of the stability change? Is it easy to design proline changes, and do these come from Fold-X and MSA together?
The answer is generally yes, but it is very mutation specific. As the reviewer pointed out, the manuscript is already rather wordy, so we feel that discussing the potential molecular basis underpinning the impact of each of the 33 different single amino acid substitutions we validated experimentally is outside the scope of this work. Indeed, some mutations contribute more to CamSol solubility, others more to FoldX energy, some to both. Each mutation does so in a different way (for example, proline substitutions do come from Fold-X and MSA together, and likely increase the conformational stability by lowering the entropy of the unfolded state).
Users interested to delve deeper into these aspects can do so directly from the webserver output.
In the revised manuscript, section "(iii) Single mutational scanning", we have now added: The final report from the webserver contains an extract of this table with those substitutions that improve the Mutation Score, and scatter plots showing predicted solubility and stability gains (see Supplementary Files 1 to 9). The full table with all explored substitutions is also provided by the webserver as a .csv file inside the output zip folder. Therein, users can find all details of each attempted mutation, including its calculated contributions to the FoldX total energy (e.g., electrostatics, hydrogen-bonds, solvation, etc.).
Reviewer #2 (Remarks to the Author):
The authors report incremental improvements to an existing software for predicting mutations that improve protein solubility by incorporating a phylogenetic filter that identifies residues that are conserved amongst homologous proteins. By combining this with force field calculations that score stability, the resulting pipeline is expected to enable users to design proteins which have reduced propensity to aggregate. The authors have tested this by applying the algorithm to the sequences of three antibodies in scFv format (VH-linker-VL) and three single domain nanobodies (VHH). The mutations predicted by the in silico analyses have been comprehensively tested experimentally and show that the introduced changes increase thermal stability and solubility, in some cases without compromising antigen binding activity. Given these use cases, the work is principally directed at industry users for increasing properties that favour the developability of antibodies as biopharmaceuticals. Although solubility/stability are key parameters, other considerations such as chemical modifications (e.g. deamidation and glycation) are also important in selecting antibody development candidates. The authors indicate that incorporating screening for such sequence-based liabilities forms part of future developments. Although the output of the pipeline provides choices, the experimental scientist using this pipeline still needs to make selections based on other knowledge and experience.
We thank the reviewer for the positive assessment of the work and for recognising the comprehensiveness of our experimental validation.
We agree that our manuscript represents a crucial first step towards the development of an increasingly more far-reaching framework for the optimisation of developability potential, which may include other aspects such as chemical and post-translational liabilities and immunogenicity. We have decided not to introduce constraints based on liabilities related to chemical degradation hotspots, immunogenicity hotspots etc. as different research groups and pharma companies use their own preferred guidelines depending on the intended application of the therapeutic molecule, i.e. some requirements for lyophilized vs. solution formulation, therapeutic vs. diagnostic application and dosage imposed constraints. However, we wish to emphasise that users expert in the field of biologic developability can already use the current implementation to remove such liabilities, albeit this is not fully automated. Sequence-based liabilities present in a WT protein (e.g. deamidation sites, PTM sites) can be entered in the algorithm as custom mutation sites, so that mutations that improve stability and solubility are suggested to remove these liabilities. Similarly, users have the option to exclude specific residues from the list of potential substitution targets. So, if any of the returned final designs contains liabilities, the calculation can be re-run by excluding the relevant residue(s) to get a different set of top-ranking designs (e.g., exclude arginine for specificity, asparagine for deamidation and glycosylation, etc.). In the revised manuscript (Algorithm section and Discussion section) we have added a few sentences to highlight these opportunities.
Specific comments
1. Antibodies are highly sensitive to concentration and are often required to be formulated at tens of mg/ml. The experimental work reported was carried out at protein concentrations 0.1-1.0 mg/ml. This raises the question of how valid the predictions are for higher antibody concentrations.
This is an excellent point, and we agree with the reviewer that we did not justify very clearly our choice of experimental assays in the previous version of our manuscript. For our experiments we have selected a suite of widely used developability assays, whose measurements (e.g. midpoint of AMS precipitation, or CIC retention times) are well-known to be predictive of (that is, correlated with) solubility and high concentration behaviour. In the discussion section of the revised manuscript, we have now added the sentences: These are widely used in vitro developability assays, whose measurements are well-known to be predictive of solubility and high concentration behaviour 43,64,66,85,86. The key advantage of these assays, which is the reason why they are so widely used, is that they require relatively low amounts of purified protein material. We wish to point out that, conversely, measuring the long-term integrity of a pharmaceutically relevant formulation would require hundreds of mgs of material (and typically a ~1-year incubation time). There are many reviews that discuss these and other in vitro developability assays in depth (including a recent one from some of the authors of this work, DOI: 10.1007/978-1-0716-1450-1_4). As an example, Jain et al. PNAS 2017 used these assays to characterise most antibodies that were in advanced clinical stages (including approved ones) at the time of their work. We cite these and several other papers on the topic in our manuscript.
2. There are several other programmes that have been developed to address the developability of antibodies, recently reviewed in Akbar et al. 2022 (DOI: 10.1080/19420862.2021.). Although in a different context, the use of a phylogenetic filter in the prediction of mutations in proteins that will improve solubility has also been reported (DOI: 10.1016/j.molcel.2016.06.012, DOI: 10.1093/bioinformatics/btaa1071). It would be appropriate to comment on the relative merits of different but similar pipelines.
We thank the reviewer for this comment. We have now added a few sentences to the discussion section mentioning the PROSS webserver and three other relevant methods, highlighting some key differences with our pipeline. We believe, however, that an in-depth discussion of the relative merits of the different pipelines and of emerging Machine-Learning approaches would be more appropriate for a review article, especially as our manuscript is already quite wordy (as pointed out by reviewer #1).
3. The title of the article indicates the applicability of the pipeline to proteins in general but only one example is provided and without any experimental validation. It is suggested that either further examples are included together with at least an experimental study, or that the title is made more specific.
We thank the reviewer for raising this point, which was also raised by reviewer 3. One challenge we have faced when comparing to published work (besides the ProTherm FDR analysis already included in Fig. 1) is that our method automatically identifies specific mutations that should be performed, which are thus unlikely to be those that have been characterised in previous studies. In the revised manuscript, we have circumvented this challenge by benchmarking our predictions with published Deep Mutational Scanning (DMS) data for seven unrelated non-antibody proteins. Our analysis of these data shows that the false discovery rate of our computational procedure is low also for generic proteins, as poorly expressing variants and mutations that disrupt protein function are not shortlisted. See corresponding sections in the revised main text and supplementary methods, as well as new Figure S10.
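As a reading aid, the false discovery rate mentioned in this response can be written as the fraction of shortlisted mutations that the experiment marks as deleterious. The sketch below assumes this definition; the function name and the toy mutation labels are hypothetical, and how the authors classified DMS variants as deleterious is not stated here.

```python
def false_discovery_rate(shortlisted, deleterious):
    """Fraction of shortlisted mutations that are experimentally deleterious.

    `shortlisted`: mutations proposed by a computational procedure.
    `deleterious`: mutations classified as deleterious by the experiment
    (for example, from DMS scores and a threshold chosen by the analyst).
    """
    shortlisted = set(shortlisted)
    if not shortlisted:
        raise ValueError("no shortlisted mutations: FDR is undefined")
    false_positives = shortlisted & set(deleterious)
    return len(false_positives) / len(shortlisted)

# Toy labels for illustration only (not data from the manuscript).
toy_shortlist = {"A1K", "G2P", "S3T", "L4E"}
toy_deleterious = {"G2P", "W9A"}
toy_fdr = false_discovery_rate(toy_shortlist, toy_deleterious)
```

A low value means that few of the proposed mutations fall among the variants that express poorly or lose function.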
4. The choice of nanobodies in the test set is interesting but given that these are typically very soluble and stable proteins, the need to improve this by mutagenesis is unlikely to be a widely adopted application of the software.
We are aware that nanobodies have this reputation, and this is certainly true for many/most nanobodies. However, we find that this is not always the case, based on observations at both Novo Nordisk and University of Cambridge, and on several discussions with colleagues working elsewhere. Many nanobodies tend to form HMW (high molecular weight) aggregates in HPLC-SEC, and would therefore benefit from solubility improvement. Similarly, a study that analysed a large number (close to 100) of immune-system derived nanobodies found that their thermal stability varied widely, and that most aggregated irreversibly upon heating (Kunz et al. "The structural basis of nanobody unfolding reversibility and thermoresistance." 2018). It is also relatively well known that it is often a challenge to push the expression yields of some nanobodies to high levels, which can indicate sub-optimal stability and/or solubility. Perhaps, as nanobodies are still emerging as therapeutic molecules, these aspects still need to be systematically investigated. Moreover, nanobodies discovered from synthetic libraries (like Nb.B201 in our work) will be more likely to have sub-optimal stability and solubility, as they haven't undergone in vivo selection and maturation. Indeed, our algorithm could improve Nb.B201 very substantially (>13 ºC gain in melting temperature, and >44 s gain in CIC RT). Discovery from synthetic libraries offers distinct advantages over immunisation (e.g., it takes less time and money and affords a much easier antigen presentation, including of non-immunogenic antigens), and therefore it is increasingly employed as a first choice to generate new nanobodies.
5. The run time for the software is admirably fast but log files when runs fail would benefit from some explanation for the non-specialist user.
We have been working on improving this aspect, and since submitting the manuscript have already made several updates to the web server and especially its log. However, it remains difficult to anticipate all possible errors. We anticipate that as users start contacting us to report errors, we will be able to progressively improve the log messages.

Reviewer #3 (Remarks to the Author):
The manuscript by Rosace, Bennett et al. presents CamSol combination -a computational pipeline to co-optimize solubility and thermodynamic stability of antibodies (and potentially other proteins) by predicting combinations of favorable mutations. Computational pipelines like this one exist and are of broad interest for rational optimization of antibodies. Two of the essential tools (i.e., CamSol and FoldX) used in the presented approach were published previously and have proven to yield reasonable predictions. To me, the novelty of CamSol combinations comes from the input of phylogenetic information to increase the accuracy of the predictions. The prediction of combinations of mutations is also a step forward. The authors demonstrate with several proteins and dozens of mutants that most of the computational predictions are reasonable. The paper is well written, the method is available via a website which I found user-friendly. The data output from the server is very clear.
We would like to thank the reviewer for these positive comments on our work.
In my opinion, better validation and some additional explanations of the method will be highly beneficial for the manuscript and will increase the value to justify publication in this journal. I listed some comments (not in the order of importance):
• Studies on similar approaches combining colloidal and conformation stability predictions, e.g., TANGO and FoldX (PMID: 2832291), Aggrescan 3D and FoldX (PMID: 31049593), have to be mentioned and the advantages of CamSol combination over these methods should be clearly explained in the discussion. Would the automated Aggrescan 3D/FoldX prediction of mutations overlap to some extent with the individual mutations from Camsol combination?
We thank the reviewer for this comment. In the revised manuscript, we added some sentences to the discussion section mentioning the SolubiS webserver and Aggrescan 3D v. 2 (respectively Tango+FoldX and Aggrescan+FoldX), highlighting some key differences with our pipeline. We believe, however, that an in-depth discussion of the relative merits of the different pipelines would be more appropriate for a review article, especially as our manuscript is already quite wordy (as pointed out by reviewer #1). As suggested by the reviewer, we ran on the Aggrescan 3D/FoldX webserver the same pdb files (and same excluded sites) used in our work for Nb.B201, Adalimumab and aminoglycoside 3'-phosphotransferase (which is one of the proteins we have used in the new DMS data benchmark, see answer to the last point). Overall, Aggrescan 3D only suggests single mutations, and only to charged residues, without any PSSM-constraint. Hence, although there is some overlap in the identified mutation sites, especially those identified on the basis of solubility by our method, there is very little agreement between the two predictions. We include below tables with the mutations suggested by Aggrescan3D, with some comments for each mutation site for the Reviewer's consideration. However, we believe that adding these Aggrescan predictions and a corresponding discussion to the current manuscript would make it too wordy and possibly confusing to readers.
• When using different databases (e.g., clinical-stage, OAE, abYsis) for the phylogenetic analysis, some of the predicted mutations overlap, but others do not. Non-expert users of CamSol combination will be faced with the challenge of deciding which databases to use. A better discussion (or maybe a table) highlighting the strong and weak points of using the different databases will be very useful. Comparisons and experimental validations for several antibody mutants optimized via CamSol combination using different sequence databases will be valuable.
We are very grateful to the reviewer for pointing out this potential source of confusion. We have now amended the webserver interface to better guide the MSA selection (see screenshot below) and have added this summary paragraph to the relevant section of the supplementary information:

Nb.B201
In summary, the single-domain-VH MSA should be used for nanobodies or other single-domain antibodies. We recommend using the OAS-human MSA for all human antibodies and the OAS-mouse for all mouse antibodies, as these MSAs best recapitulate the diversity of the repertoires they represent. Finally, the post-phase-I MSA can be used in cases where there is a strong need to retain similarity with clinical antibody candidates, as, with 526 sequences, it currently has a more restricted diversity than the OAS-human MSA.
Revised webserver table, with OAS-human now used as default MSA:
In our work, we have run the CamSol combination procedure on Adalimumab (Humira) using both the OAS-Human and the Post-Phase-I MSAs (see Table S3). We experimentally validated the best five-mutation design from each run. These two designs had 3 mutations in common (A23K and W53P on VH and A94P on VL), while the unique mutations were T52S and S55G from the Post-Phase-I calculation and A40P and S49G from the OAS-Human calculation, all on the VH. Experimentally, we don't find much difference between these two designs. The one obtained from the OAS-Human MSA had slightly higher thermal stability (by 0.9 ºC), perhaps reflecting the higher diversity, and hence larger 'allowed mutational space', of this MSA.
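The comparison between the two Adalimumab designs amounts to simple set operations on the mutation lists quoted above. The short sketch below reproduces that bookkeeping; the mutations and chain labels are those given in the response, while the function and variable names are illustrative only.

```python
# Five-mutation designs as listed in the response, stored as (chain, mutation).
post_phase_i_design = {
    ("VH", "A23K"), ("VH", "W53P"), ("VL", "A94P"),
    ("VH", "T52S"), ("VH", "S55G"),
}
oas_human_design = {
    ("VH", "A23K"), ("VH", "W53P"), ("VL", "A94P"),
    ("VH", "A40P"), ("VH", "S49G"),
}

def compare_designs(first, second):
    """Split two designs into shared mutations and mutations unique to each."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }

comparison = compare_designs(post_phase_i_design, oas_human_design)
```

The same helper can be applied to the top designs of any two runs that differ only in the chosen MSA, to see how much the choice of alignment changes the outcome.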
• FoldX is known to work best with crystal structures as an input. However, the CamSol combination will most likely be used on homology models of antibodies. Can the authors show an unbiased comparison of the quality of the predictions when using crystal structures versus homology models? Does the server predict the same mutations?
We thank the reviewer for raising this point, as it is likely that some users will use the method starting from modelled structures. We have now performed a new analysis where we compare the results of runs from crystal structures and corresponding models for 19 different antibodies.
To make this analysis as unbiased as possible, we selected 19 Fvs that belong to the test set of ImmuneBuilder, and hence were not employed for algorithm training. ImmuneBuilder is the software we have used to obtain the models. We find a very good agreement between predictions carried out starting from the model or from the corresponding structures. To avoid making our manuscript even more wordy, and as this is not the key focus of our work, we cover this new analysis in a small paragraph of the revised discussion section. However, all details of it and a more in-depth discussion of its results can be found in the revised Supplementary Information, and the results are plotted in the new Figure S11.
Overall, approaches to model protein and antibody structures are improving at an unprecedented rate. We expect that, in the near future, there won't be any difference between running our approach on a crystal structure or on a model.
• The melting temperature was used as a proxy for the conformational stability, but measurement of the dG of unfolding to validate the predictions would be more accurate. Can the authors provide dG values on selected mutants and the corresponding wild-types?
We have now carried out additional GdnCl chemical denaturation experiments to measure the ΔG of unfolding. We did this for at least two variants per antibody (and 3 for Adalimumab and Nb.B201). As expected, we find that, even if ΔG measurements are affected by larger experimental uncertainties than Tm measurements, there is a perfect agreement in the stability rankings of mutational variants obtained with chemical and heat denaturation. These results are briefly touched on in the revised discussion section, and described in detail in the revised supplementary methods (see new Fig. S12 and Table S4).
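For readers less familiar with chemical denaturation, the relations below show how a ΔG of unfolding is commonly extracted from a denaturant titration. They assume a two-state transition and the standard linear extrapolation model; the response does not state which model was fitted, so this is a generic sketch rather than a description of the authors' analysis. Here $[D]$ is the denaturant concentration, $m$ the slope of the linear dependence, $f_U$ the fraction of unfolded protein, and $[D]_{1/2}$ the denaturation midpoint.

```latex
\begin{align}
  \Delta G_{\mathrm{unf}}([D]) &= \Delta G_{\mathrm{unf}}^{\mathrm{H_2O}} - m\,[D] \\
  f_U([D]) &= \frac{\exp\!\left(-\Delta G_{\mathrm{unf}}([D])/RT\right)}
                   {1 + \exp\!\left(-\Delta G_{\mathrm{unf}}([D])/RT\right)} \\
  [D]_{1/2} &= \frac{\Delta G_{\mathrm{unf}}^{\mathrm{H_2O}}}{m}
\end{align}
```

Because $\Delta G_{\mathrm{unf}}^{\mathrm{H_2O}}$ is obtained by extrapolating to zero denaturant, its uncertainty depends on both the fitted midpoint and the fitted slope, which is consistent with the larger experimental uncertainties noted above relative to Tm.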
• It is not surprising that the ammonium sulphate precipitation did not yield reliable results to validate the predictions of the nanobodies. The cross-interaction chromatography might show some correlations with the predictions, but CIC is not really a measurement for colloidal stability. Can you use a light scattering approach to measure the second osmotic virial coefficient which is the actual thermodynamic parameter that should correlate with colloidal stability? Alternatively, affinity-capture self-interaction spectroscopy is a good proxy for colloidal stability.
We thank the reviewer for these suggestions. The AMS precipitation results we have included have very broad confidence intervals that overlap with each other for many variants. For this reason, we cannot obtain a reliable ranking of relative solubility for all designed variants. However, these results clearly show that the WT nanobody precipitates at a lower AMS concentration than the designed variants, thus supporting the effectiveness of the design. We agree that accurate light-scattering measurements of the second osmotic virial coefficient can be useful. However, such measurements require relatively large amounts of purified protein material, as proteins need to be concentrated to > 1 mg/mL (ideally to 10 mg/mL) and then titrated down. We could not obtain enough material to do these experiments (in addition to those we have done) with our mid-throughput, mid-scale expression. We wish to point out that CIC measurements have been shown by many reports to correlate well with measurements of solubility, and in fact CIC is a widely used technique to assess antibody developability potential (see references in the main text).
AC-SINS is a great suggestion, as it would require limited material. However, whereas AC-SINS is well established for the IgG antibody format, where it relies on a specific anti-Fc antibody conjugated to the gold nanoparticles, it is not well validated for other molecular formats such as scFv and nanobodies, for which a new capture approach would need to be developed. We are aware that promising data exist on Fab-SINS ("Biophysical and Sequence-Based Methods for Identifying Monovalent and Bivalent Antibodies with High Colloidal Stability", PMID: 29154550), but we still do not believe that this method is optimal for the formats in scope here, as it would require extensive assay development and validation, which is outside the scope of this work.
• In the structures that I optimized with CamSol combinations, there were several interesting predictions. It appears that the colloidal stability of antibodies with basic isoelectric points is often increased by introducing more positive charges (which are known to drive polyspecificity). I did not observe a case where the method chose to optimize an antibody by introducing negative charges to significantly reduce the isoelectric point (although it is known that antibodies with acidic isoelectric points can exhibit high solubility and stability). What is the possible reason for this?
the mutations the authors find with directed evolution may be predicted by our method, there is no reason to expect any statistically significant overlap. Importantly, in our previous work, we have already benchmarked CamSol solubility predictions and aggregation hotspot detection on these variants of MEDI1912, finding a perfect agreement (see Fig. 5 of Sormanni et al. 2017, doi: 10.1038/s41598-017-07800-w, where mAb1 and mAb2 are respectively MEDI578 and MEDI1912; panel C uses the same SEC-HPLC data that the authors of PMID: 32286330 use in their Fig. 2). As we previously reported a benchmark of CamSol with these mAb variants, we believe that it wouldn't be suitable to report a very closely related analysis in the present work.
• The validations for IgGs were performed by measurements on the scFv. Reformatting an scFv into a Fab can have unexpected effects on protein solubility/stability. Can the authors validate their predictions for several mutants produced as full-length IgGs?
We regret that we are unable to carry out additional rounds of protein production in IgG format and subsequent characterisation. However, we are not aware of a single instance (published or in-house) in which the stability or solubility ranking of different mutational variants of the same WT scFv changed upon reformatting to Fab or full IgG. In other words, the measured property (e.g., the melting temperature) can change, in some cases substantially, when reformatting from scFv to Fab, but in every case known to us the ranking of mutational variants of the same WT scFv is conserved (e.g., more stable scFv variants yield more stable Fab or IgG variants, provided that the constant domains are always the same). We expect that this will be especially true in our case, as most of our mutations are in the CDR regions and thus far away from the constant domains, and as our Fv regions differ from their WT by a maximum of only 5 mutations. One of the reasons we initially chose the scFv format is that a single unfolding transition is observed, which facilitates the measurement of stability. Conversely, in IgG format, multiple transitions are typically observed, corresponding to the unfolding of the different domains, which can substantially complicate the interpretation of stability measurements.
• It will be highly valuable if the authors show correlations between the CamSol combination predictions and the production yields of the proteins.
We agree with the reviewer that expression yields are a relevant developability readout. However, in our work we have chosen to characterise experimentally as many designed variants as possible. Therefore, we could express these only at relatively small scale and, crucially, only once. In our experience, expression yields can vary quite substantially from batch to batch under these expression conditions (most of the variability likely comes from the efficiency of the transient transfection), and we do not feel confident in making any claims or drawing any correlation from an N=1 expression experiment. Overall, from this one instance, we found that expression yields under our conditions did not vary much among designed variants of the same WT antibody. Furthermore, expression of early protein batches for research is often done with HEK cells, which is not always representative of development batches, which are often expressed in CHO cells using different fermentation systems. We are therefore not convinced that these data would add sufficient value.
• There is no validation of the method with proteins other than antibodies, although the title implies that the approach also works with other proteins. There is plenty of data on the effect of mutations on enzymatic stability/solubility (beyond the ProTherm database already used). The authors could make predictions for such enzymes and compare them to published datasets. Alternatively, the authors could select a challenging model protein (other than an antibody) for optimization by CamSol combination and substantiate the predictions with their own experimental data to demonstrate the broad applicability of their approach.
We thank the reviewer for raising this point. One challenge we have faced when comparing to published work (besides the ProTherm FDR analysis already included in Fig. 1) is that our method automatically identifies the specific mutations that should be made, which are thus unlikely to be those that have been characterised in previous studies. In the revised manuscript, we have circumvented this challenge by benchmarking our predictions against published Deep Mutational Scanning (DMS) data for seven unrelated non-antibody proteins. Our analysis of these data shows that the false discovery rate of our computational procedure is also low for generic proteins, as poorly expressing variants and mutations that disrupt protein function are not shortlisted. See the corresponding sections in the revised main text and supplementary methods, as well as the new Figure S10.
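To illustrate what a false discovery rate means in this setting, the sketch below counts how many shortlisted mutations fall below a deleteriousness cut-off in a DMS dataset. It is a simplified illustration under stated assumptions, not the analysis pipeline used for Figure S10: the function name, the mutation labels, the scores, and the cut-off of 0.5 are all hypothetical.

```python
def false_discovery_rate(shortlisted, dms_scores, deleterious_below):
    """Fraction of shortlisted mutations that the DMS data flag as deleterious.

    Mutations without a DMS score are ignored, since they can be
    classified neither as true hits nor as false hits.
    """
    scored = [m for m in shortlisted if m in dms_scores]
    if not scored:
        raise ValueError("no shortlisted mutation has a DMS score")
    false_hits = sum(1 for m in scored if dms_scores[m] < deleterious_below)
    return false_hits / len(scored)


# Hypothetical DMS fitness scores (1.0 ~ wild-type-like); NOT real data.
dms_scores = {"A10S": 0.95, "L45K": 0.20, "V78E": 1.05, "G99D": 0.88}

# Hypothetical shortlist; "T5R" has no DMS score and is therefore skipped.
shortlisted = ["A10S", "V78E", "G99D", "T5R"]

fdr = false_discovery_rate(shortlisted, dms_scores, deleterious_below=0.5)
print(fdr)
```

The choice of cut-off matters: DMS readouts differ between studies (expression, binding, enzymatic activity), so in practice the threshold would have to be set separately for each dataset, for example relative to the score distribution of synonymous or wild-type-like variants.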